Four basic symmetry types in the universal 7-cluster structure of 143 complete bacterial genomic sequences
Authors
Abstract
Coding information is the main source of heterogeneity (non-randomness) in the sequences of bacterial genomes. This information can be naturally modeled by analysing cluster structures in the “in-phase” triplet distributions of relatively short genomic fragments (200-400 bp). We found a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in GenBank in August 2004, and explained its properties. The 7-cluster structure is responsible for the main part of sequence heterogeneity in bacterial genomes. In this sense, the 7-cluster structure is the basic model of a bacterial genome sequence. We demonstrate that four basic “pure” types of this model are observed in nature: “parallel triangles”, “perpendicular triangles”, the degenerate case and the flower-like type. We show that the codon usage of bacterial genomes is described with high accuracy by a multi-linear function of their genomic G+C content (more precisely, by two similar functions, one for eubacterial genomes and the other for archaea). Animated 3D scatter plots of the cluster structure for all 143 genomes are collected in a database and made available on our web site: http://www.ihes.fr/~zinovyev/7clusters. The finding can be readily introduced into any software for gene prediction, sequence alignment or bacterial genome classification.
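As a rough illustration of what an “in-phase” triplet distribution of a short fragment could look like in practice, the sketch below cuts a sequence into fixed-width windows and counts non-overlapping triplets from the start of each window. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `triplet_frequencies` and `fragment_vectors`, the 300 bp window width and the 50 bp step are all hypothetical choices made for the example, and the clustering step itself is left out.

```python
from collections import Counter
from itertools import product

# All 64 possible triplets over the DNA alphabet, in a fixed order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def triplet_frequencies(fragment):
    """Return a 64-element frequency vector of the non-overlapping
    triplets read from position 0 of the fragment."""
    fragment = fragment.upper()
    triplets = [fragment[i:i + 3] for i in range(0, len(fragment) - 2, 3)]
    # Skip triplets containing ambiguous symbols such as N.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def fragment_vectors(genome, width=300, step=50):
    """Slide a window over the genome and compute one triplet-frequency
    vector per window. Width and step are assumed values; the abstract
    only states that fragments are 200-400 bp long."""
    return [
        triplet_frequencies(genome[start:start + width])
        for start in range(0, len(genome) - width + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATGAAACGT" * 60  # toy sequence, not real genomic data
    vectors = fragment_vectors(toy)
    print(len(vectors), "fragments,", len(vectors[0]), "dimensions each")
```

Each fragment becomes a point in a 64-dimensional space. A cluster structure like the one described would then be sought in the distribution of these points, for example after projecting them onto a few principal components.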
Similar resources
Universal Seven-cluster Structure of Genome Fragment Distribution: Basic Symmetry in Triplet Frequencies
We found a universal seven-cluster structure in bacterial genomic sequences and explained its properties. Based on the analysis of 143 completely sequenced bacterial genomes available in GenBank in August 2004, we show that there are four 'pure' types of the seven-cluster structure observed. The type of cluster structure depends on GC content and reflects basic symmetry in triplet frequencies. ...
Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences
Three results are presented. First, we prove the existence of a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in GenBank in August 2004, and explain its properties. The 7-cluster structure is responsible for the main part of sequence heterogeneity in bacterial genomes. In this sense, the 7-cluster structure is the basic model of a bacterial genome sequence. We...
Genetic Diversity and Molecular Phylogeny of Iranian Sheep Based on Cytochrome b Gene Sequences
Phylogenetic relationships and genetic variation between two Iranian sheep breeds were analyzed using cytochrome b (cyt-b) gene sequences. Genomic DNA was isolated by the salting-out method, and the cytochrome b gene was amplified using the polymerase chain reaction restriction (PCR) method with a pair of primers. The partial cyt-b sequence of Iranian sheep is 780 bp long and contains 13 variable sites and...
Designing specific primers for studying single nucleotide polymorphism (SNP) in genes and determining their function
A great deal of gene sequence information is available, but the functions of many of these genes are still unknown. To fill the gap between the structure and function of these sequences, many reverse-genetics studies have been carried out. The current experiment studies how to design gene-specific primers that can determine single-nucleotide diversity and its impact on gene function. This research was conducted at International...
Molecular identification of Dicrocoelium dendriticum using 28s rDNA genomic marker and its histopathologic features in domestic animals in western Iran
Introduction: Dicrocoeliasis is a common disease of the bile ducts and gallbladder of domestic and wild ruminants. This disease is caused by different species of Dicrocoelium, including Dicrocoelium dendriticum. The aim of this study was to identify the pathological damage and molecular features associated with this parasite in ruminants. Materials and Methods: In this cross-sectional study, 180 fresh...